Getting Started with R(Studio)

Data Carpentry for Social Sciences and Humanities

2024-09-30

Live notes

https://edu.nl/7fdgj

Why R?

It’s a programming language/software that is FREE and open source! 🎉

It was created by statisticians for statistics 📊

Because it’s FREE and open source and works with scripts, it’s great for reproducibility 💪

Did I mention FREE?

Ok… So why RStudio?

RStudio is an integrated development environment (IDE)

It’s essentially a (much prettier) wrapper for the R software

R is integrated into RStudio, so you never actually have to open R, which is a good thing…

Let’s take the tour

Organising working directory

Basics of R

R is a language spoken by the R software

Software is not good at interpreting things

So we need to learn its language to communicate EXACTLY what we want

But unlike with a natural language, knowing a few R ‘words’ and ‘phrases’ can take us really far!

It’s the beginning of a journey! 🧳

Things to look forward to:

percent_items %>%
    ggplot(aes(x = village, y = percent, fill = items)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_wrap(~ items) +
    theme_bw() +
    theme(panel.grid = element_blank(),
          legend.position = "none")

Things to look forward to:

Artwork by @allison_horst

Naming conventions

Letters, numbers, _ and . are allowed

Names cannot start with a number or an underscore

Use of . not recommended

Some names not allowed (e.g. if, else)

Others are allowed, but not recommended because they are already function names (e.g. weights, df, mean)

Case sensitive: Age is different from age

Stick to the same convention
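
A minimal sketch of these rules in practice. The object names are made up for illustration; make.names() is the base R helper that shows how R would repair a name that is not allowed.

```r
# Valid names: letters, numbers, _ and . (not starting with a number)
household_size <- 4
householdSize2 <- 4

# Case sensitive: Age and age are two different objects
Age <- 30
age <- 25
same_value <- Age == age
same_value

# make.names() shows how R would repair names that are not allowed
repaired <- make.names(c("2nd_visit", "if", "my var"))
repaired
```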

Exercise 1

3 mins

Create two variables r_length and r_width and assign them values.

Create a third variable r_area and give it a value by multiplying r_length and r_width.

Solution

r_length <- 6
r_width <- 7
r_area <- r_length * r_width

Exercise 2

4 mins

Type ?round into the console to open the help page for the round() function.

Find the appropriate function to round 1.624 down to the nearest integer.

Solution

floor(1.624)
[1] 1
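
For comparison, a short sketch of the other functions listed on the ?round help page, applied to the same number. The variable x is introduced here only for illustration.

```r
x <- 1.624

round(x)              # nearest integer
round(x, digits = 2)  # keep two decimal places
ceiling(x)            # always round up
floor(x)              # always round down

# For negative numbers, trunc() and floor() differ:
trunc(-x)             # drops the decimals (rounds towards zero)
floor(-x)             # rounds down (away from zero)
```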

Data Types

numeric (e.g. 3.14)

character (e.g. 'car')

integer (e.g. 50L)

logical (e.g. TRUE or FALSE)

complex

raw
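
A quick way to see these types is to ask R directly with typeof(). The values below are arbitrary examples. Note that typeof() reports numeric values as "double", while class() reports them as "numeric".

```r
typeof(3.14)         # numeric values are stored as "double"
typeof("car")        # "character"
typeof(50L)          # "integer" (the L suffix marks an integer)
typeof(TRUE)         # "logical"
typeof(1 + 2i)       # "complex"
typeof(as.raw(255))  # "raw"

class(3.14)          # "numeric"
```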

Exercise 3

8 mins

What will happen in each of the examples below?

💡 Hint: use typeof() to check the data type of your objects

num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2L, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

Why does this happen?

Solution

Vectors can only contain a single data type.

R converts every element to the most general type present in the vector, so that as little information as possible is lost.

logical → integer → double → character

num_char
[1] "1" "2" "3" "a"
num_logical
[1] 1 2 3 1
char_logical
[1] "a"    "b"    "c"    "TRUE"
tricky
[1] "1" "2" "3" "4"
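
Following the hint, typeof() confirms which type each vector ended up with. The sketch below also converts back with as.numeric(), which is not part of the exercise and is shown only to illustrate what the conversion loses.

```r
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2L, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

typeof(num_char)      # "character"
typeof(num_logical)   # "double"
typeof(char_logical)  # "character"
typeof(tricky)        # "character"

# Converting back: "4" becomes a number again...
as.numeric(tricky)
# ...but "a" cannot be converted, so it becomes NA (with a warning)
as.numeric(num_char)
```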

Exercise 4

4 mins

How many values in combined_logical are "TRUE" (as a string)?

num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
combined_logical <- c(num_logical, char_logical)

Solution

combined_logical
[1] "1"    "2"    "3"    "1"    "a"    "b"    "c"    "TRUE"

Only one. The TRUE in num_logical is converted to 1, and then to "1" when combined with char_logical, so only the TRUE from char_logical ends up as "TRUE".
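
One way to let R do the counting, as a sketch: compare every element with the string "TRUE" and sum the result. The name n_true is introduced here for illustration.

```r
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
combined_logical <- c(num_logical, char_logical)

# The comparison gives a logical vector; sum() counts the TRUEs
n_true <- sum(combined_logical == "TRUE")
n_true
```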

Exercise 5

8 mins

  1. Using this vector of rooms, create a new vector with the NAs removed:
rooms <- c(1, 2, 1, NA, 3, 1, 3, 2, 8, NA, 1)

  2. Calculate the median of rooms.

  3. Use R to calculate how many elements of rooms are larger than 2.

Solution

# 1
rooms_no_na <- rooms[!is.na(rooms)]
# or
rooms_no_na <- na.omit(rooms)

# 2
median(rooms, na.rm = TRUE) # or median(rooms_no_na)
[1] 2
# 3
rooms_above_2 <- rooms_no_na[rooms_no_na > 2]
length(rooms_above_2)
[1] 3
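
An alternative for the last step, sketched under the assumption that you only need the count and not the values themselves: sum the logical comparison directly. The name n_above_2 is made up for this example.

```r
rooms <- c(1, 2, 1, NA, 3, 1, 3, 2, 8, NA, 1)

# rooms > 2 gives TRUE, FALSE or NA for each element;
# sum() counts each TRUE as 1, and na.rm = TRUE skips the NAs
n_above_2 <- sum(rooms > 2, na.rm = TRUE)
n_above_2
```

This avoids creating an intermediate vector, at the cost of being a little less explicit about what is being counted.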